Abstract

We introduce a symmetrization technique that allows us to translate a problem of
controlling the deviation of some functionals on a product space from their mean into a
problem of controlling the deviation between two independent copies of the functional.
As an application we give a new easy proof of Talagrand's concentration inequality
for empirical processes, where besides symmetrization we use only Talagrand's
concentration inequality on the discrete cube $\{-1,+1\}^n$. As another application of this
technique we prove new Vapnik-Chervonenkis type inequalities. For example, for VC-classes
of functions we prove a classical inequality of Vapnik and Chervonenkis only
with normalization by the sum of variance and sample variance.
1 Introduction and main results.



Let us consider a measurable space $\Omega$ with probability measure $\mu$, and the corresponding
product space $(\Omega^n, \mu^n)$. Given a class of measurable functions $\mathcal{F} = \{f : \Omega \to \mathbb{R}\}$, we consider
a functional

$$Z(x) = \sup_{f \in \mathcal{F}} \sum_{i=1}^{n} f(x_i),$$

where $x = (x_1, \ldots, x_n) \in \Omega^n$, which is usually called an empirical process. To avoid
measurability problems we will assume that $\mathcal{F}$ is countable, or even finite. Our main interest is
to study the deviation inequalities for this (or similar) functional from its mean. The main
observation of this paper is that this problem can be translated into a problem of studying
$Z(x) - Z(y)$, where $y$ lives on a separate copy of $\Omega^n$. This new problem turns out to be easier,
at least in the examples we have in mind here, as it can be handled with Talagrand's convex
distance inequality on $\{-1,+1\}^n$, which is the simplest case of convex distance inequality
(see Talagrand (1995)).

As a first example of application of this technique we will give an easy proof of Talagrand's
concentration inequality for $Z(x)$. As a second example, we will prove new
Vapnik-Chervonenkis type inequalities.

Let us start by proving the main result that will allow us to implement the mentioned
symmetrization. For $x \in \mathbb{R}$ we will denote $(x)_+ = \max(x, 0)$.



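Lemma 1 If $\xi$ and $\nu$ are r.v.s such that for any number $a \in \mathbb{R}$ and a function $\phi(x) = (x-a)_+$,

$$\mathbb{E}\phi(\xi) \le \mathbb{E}\phi(\nu)$$

and for some $\Gamma \ge 1$, $\gamma > 0$ and for all $t \ge 0$

$$\mathbb{P}(\nu \ge t) \le \Gamma e^{-\gamma t},$$

then for all $t \ge 0$

$$\mathbb{P}(\xi \ge t) \le \Gamma e^{1-\gamma t}.$$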



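Proof. Let $\phi(x) = (x - a)_+$ for some $a \in \mathbb{R}$ that will be chosen later. Note that $\phi$ is
nondecreasing. For $t > 0$ such that $\phi(t) > 0$ we can write

$$\mathbb{P}(\xi \ge t) \le \frac{\mathbb{E}\phi(\xi)}{\phi(t)} \le \frac{\mathbb{E}\phi(\nu)}{\phi(t)} \le \frac{1}{\phi(t)}\Big(\phi(0) + \int_0^\infty \phi'(x)\,\mathbb{P}(\nu \ge x)\,dx\Big) \le \frac{1}{\phi(t)}\Big(\phi(0) + \Gamma\int_0^\infty \phi'(x)\,e^{-\gamma x}\,dx\Big),$$

where we used integration by parts. Since $\Gamma \ge 1$, the bound is trivial for $t < \gamma^{-1}$, so we can assume that $t \ge \gamma^{-1}$. Take

$$a = t - \frac{1}{\gamma}, \qquad \phi(x) = \Big(x - t + \frac{1}{\gamma}\Big)_+.$$

Then $\phi(t) = \gamma^{-1}$, $\phi(0) = 0$ and

$$\int_0^\infty \phi'(x)\,e^{-\gamma x}\,dx = \int_{t-\gamma^{-1}}^\infty e^{-\gamma x}\,dx = \gamma^{-1} e^{1-\gamma t},$$

which gives $\mathbb{P}(\xi \ge t) \le \Gamma e^{1-\gamma t}$.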

□ 
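The last step of the proof of Lemma 1 can be checked numerically. The sketch below is only an illustration: the function names, the midpoint rule, and the truncation of the infinite integration range are assumptions of this example, not part of the paper. It compares $\gamma\Gamma\int_{t-1/\gamma}^{\infty} e^{-\gamma x}\,dx$ with the closed form $\Gamma e^{1-\gamma t}$.

```python
import math


def lemma1_bound(Gamma: float, gamma: float, t: float) -> float:
    """Closed-form right-hand side Gamma * exp(1 - gamma * t)."""
    return Gamma * math.exp(1.0 - gamma * t)


def lemma1_integral(Gamma: float, gamma: float, t: float,
                    steps: int = 200_000, span: float = 40.0) -> float:
    """Midpoint-rule value of gamma * Gamma * int_{t - 1/gamma}^inf exp(-gamma x) dx.

    Assumption of this sketch: the infinite range is cut off after
    span / gamma units, where the integrand is negligible.
    """
    lower = t - 1.0 / gamma
    h = (span / gamma) / steps
    total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * h
        total += math.exp(-gamma * x)
    return gamma * Gamma * total * h
```

For instance, `lemma1_integral(2.0, 0.5, 3.0)` and `lemma1_bound(2.0, 0.5, 3.0)` should agree up to the discretization error of the midpoint rule.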

It is clear that the Lemma can be stated in more generality; for instance, we could
consider the case of tails $\Gamma e^{-\gamma t^{\alpha}}$ for $\alpha > 0$. But it is irrelevant for the applications of this
paper. The main consequence is given by the following corollary.

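Corollary 1 Let $\xi_i(x, y) : \Omega^n \times \Omega^n \to \mathbb{R}$, $1 \le i \le 3$, be measurable functions defined on two
copies of $\Omega^n$ and let

$$\xi_i'(x) = \int \xi_i(x, y)\,d\mu^n(y).$$

If $\xi_3 \ge 0$ and for all $t \ge 0$

$$\mu^{2n}\big(\xi_1 \ge \xi_2 + (\xi_3 t)^{1/2}\big) \le \Gamma e^{-\gamma t},$$

then for all $t \ge 0$

$$\mu^{n}\big(\xi_1' \ge \xi_2' + (\xi_3' t)^{1/2}\big) \le \Gamma e^{1-\gamma t}.$$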
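Proof. Since $\sqrt{ab} = \inf_{\delta>0}\big(\delta a + b/(4\delta)\big)$ we can rewrite the events

$$\big\{\xi_1 \ge \xi_2 + (\xi_3 t)^{1/2}\big\} = \Big\{\sup_{\delta>0} 4\delta(\xi_1 - \xi_2 - \delta\xi_3) \ge t\Big\}$$

and, similarly,

$$\big\{\xi_1' \ge \xi_2' + (\xi_3' t)^{1/2}\big\} = \Big\{\sup_{\delta>0} 4\delta(\xi_1' - \xi_2' - \delta\xi_3') \ge t\Big\}.$$

Let us denote

$$\nu = \sup_{\delta>0} 4\delta(\xi_1 - \xi_2 - \delta\xi_3), \qquad \xi = \sup_{\delta>0} 4\delta(\xi_1' - \xi_2' - \delta\xi_3').$$

Clearly,

$$\xi = \sup_{\delta>0} \int 4\delta(\xi_1 - \xi_2 - \delta\xi_3)\,d\mu^n(y) \le \int \nu\,d\mu^n(y),$$

and, thus, by Jensen's inequality, for any nondecreasing convex function $\phi$,

$$\int \phi(\xi)\,d\mu^n(x) \le \int \phi\Big(\int \nu\,d\mu^n(y)\Big)\,d\mu^n(x) \le \int\!\!\int \phi(\nu)\,d\mu^n(x)\,d\mu^n(y).$$

Lemma 1 implies the result.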



□ 
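The variational identity $\sqrt{ab} = \inf_{\delta>0}(\delta a + b/(4\delta))$ that drives the proof of Corollary 1 is easy to check numerically. The following sketch is illustrative only: the helper names are hypothetical, and it assumes $a, b > 0$ for the identity and $\xi_3 > 0$, $t > 0$ for the comparison of the two forms of the event.

```python
import math


def variational_value(a: float, b: float, delta: float) -> float:
    """The quantity delta * a + b / (4 * delta) whose infimum is sqrt(a * b)."""
    return delta * a + b / (4.0 * delta)


def minimizing_delta(a: float, b: float) -> float:
    """Minimizer delta = sqrt(b / a) / 2 (assumes a > 0 and b > 0)."""
    return 0.5 * math.sqrt(b / a)


def event_direct(xi1: float, xi2: float, xi3: float, t: float) -> bool:
    """The event {xi1 >= xi2 + sqrt(xi3 * t)}."""
    return xi1 >= xi2 + math.sqrt(xi3 * t)


def event_via_sup(xi1: float, xi2: float, xi3: float, t: float) -> bool:
    """The event {sup_{delta > 0} 4 delta (xi1 - xi2 - delta xi3) >= t}.

    Assumes xi3 > 0 and t > 0. For a positive gap the supremum equals
    gap^2 / xi3, attained at delta = gap / (2 * xi3); otherwise it is 0.
    """
    gap = xi1 - xi2
    if gap <= 0.0:
        return False
    return gap * gap / xi3 >= t
```

With `a = 2`, `b = 8` the minimizer is `delta = 1` and the minimum is `4 = sqrt(16)`; the two event functions return the same truth value on any input satisfying the stated assumptions.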



As we mentioned above, besides the symmetrization of Corollary 1 we will need Talagrand's
convex distance inequality, which we will formulate now.






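Consider the space $\{0,1\}^n$ with uniform measure $\mathbb{P}_\varepsilon$. If $\varepsilon \in \{0,1\}^n$ and $A \subseteq \{0,1\}^n$,
denote

$$U_A(\varepsilon) = \big\{(s_i)_{i \le n} \in \{0,1\}^n : \exists\, \varepsilon' \in A,\ s_i = 0 \Rightarrow \varepsilon_i' = \varepsilon_i\big\}.$$

Denote the "convex hull" distance between the point $\varepsilon$ and a set $A$ as

$$f_c(A, \varepsilon) = \inf\{|s| : s \in \mathrm{conv}\, U_A(\varepsilon)\},$$

where $|s|$ denotes the Euclidean norm of $s$. The concentration inequality of Talagrand
(Theorem 4.3.1 in [14]) states the following.

Proposition 1 For any $\alpha > 0$,

$$\mathbb{P}_\varepsilon\big(f_c^2(A, \varepsilon) \ge t\big) \le \frac{1}{\mathbb{P}_\varepsilon(A)^{\alpha}} \exp\Big\{-\frac{\alpha}{\alpha+1}\,t\Big\}. \tag{1.1}$$

Remark. In [14] this result was formulated for $\alpha \ge 1$, but it was proven (and used) for
$\alpha > 0$.

The main feature of this distance is that if $f_c^2(A, \varepsilon) \le t$, then (Theorem 4.1.2 in [14])

$$\forall (\lambda_i)_{i \le n}\ \ \exists\, \varepsilon' \in A: \quad \sum_{i=1}^{n} \lambda_i I(\varepsilon_i' \ne \varepsilon_i) \le \Big(t \sum_{i=1}^{n} \lambda_i^2\Big)^{1/2}. \tag{1.2}$$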

We will start by giving a new proof of Talagrand's concentration inequality for empirical 
processes. 



2 Talagrand's concentration inequality for empirical processes.

For simplicity of notations from now on we will write $\mathbb{P}$ to denote any probability measure,
and $\mathbb{P}_\xi$ to specify the distribution on the space of random variable $\xi$, with all other variables
fixed. Similarly, to denote the expectation we will write $\mathbb{E}$ and $\mathbb{E}_\xi$.
Let us define a mixed uniform variance as

$$V = \mathbb{E}_y \sup_{f \in \mathcal{F}} \sum_{i=1}^{n} (f(x_i) - f(y_i))^2. \tag{2.1}$$

In a sense, $V$ is a uniform version of the sum of variance and sample variance, since in the
case when $\mathcal{F}$ consists of one function, this is exactly what it is. Clearly, $V$ is a function of $x$.
The following theorem holds.

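Theorem 1 Let $V$ be defined by (2.1). Then for any $\alpha > 0$

$$\mathbb{P}\Big(\sup_{f \in \mathcal{F}} \sum_{i=1}^{n} f(x_i) \ge \mathbb{E} \sup_{f \in \mathcal{F}} \sum_{i=1}^{n} f(x_i) + 2\sqrt{Vt}\Big) \le 2^{\alpha+1} \exp\Big\{1 - \frac{\alpha}{\alpha+1}\,t\Big\}$$

and

$$\mathbb{P}\Big(\sup_{f \in \mathcal{F}} \sum_{i=1}^{n} f(x_i) \le \mathbb{E} \sup_{f \in \mathcal{F}} \sum_{i=1}^{n} f(x_i) - 2\sqrt{Vt}\Big) \le 2^{\alpha+1} \exp\Big\{1 - \frac{\alpha}{\alpha+1}\,t\Big\}.$$

Remark. One can optimize the bound over $\alpha$, which would give that for $t \ge \log 2$, the
bound can be written as $2\exp\{1 - (\sqrt{t} - \sqrt{\log 2})^2\}$.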
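The optimization over $\alpha$ mentioned in the Remark can be made explicit. Writing $\beta = \alpha + 1$, the exponent $\beta \log 2 + 1 - t + t/\beta$ is minimized at $\beta = \sqrt{t/\log 2}$, which corresponds to an admissible $\alpha > 0$ when $t > \log 2$. The sketch below (function names are illustrative assumptions) checks that this choice reproduces $2\exp\{1 - (\sqrt{t} - \sqrt{\log 2})^2\}$.

```python
import math


def bound_alpha(alpha: float, t: float) -> float:
    """Right-hand side of Theorem 1: 2^(alpha+1) * exp(1 - alpha * t / (alpha + 1))."""
    return 2.0 ** (alpha + 1.0) * math.exp(1.0 - alpha * t / (alpha + 1.0))


def optimal_alpha(t: float) -> float:
    """Minimizer alpha = sqrt(t / log 2) - 1; positive only when t > log 2."""
    return math.sqrt(t / math.log(2.0)) - 1.0


def optimized_bound(t: float) -> float:
    """Closed form 2 * exp(1 - (sqrt(t) - sqrt(log 2))^2) from the Remark."""
    return 2.0 * math.exp(1.0 - (math.sqrt(t) - math.sqrt(math.log(2.0))) ** 2)
```

For example, with `t = 4.0` the value `bound_alpha(optimal_alpha(4.0), 4.0)` coincides with `optimized_bound(4.0)` up to rounding, and no other `alpha` on a grid gives a smaller value.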

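Proof. We will only prove the upper tail, since the proof of the lower tail is exactly the
same, once one switches $Z$ and $\mathbb{E}Z$. Since

$$\mathbb{E} \sup_{f \in \mathcal{F}} \sum_{i=1}^{n} f(x_i) = \mathbb{E}_y \sup_{f \in \mathcal{F}} \sum_{i=1}^{n} f(y_i),$$

Corollary 1 implies that it is enough to prove that

$$\mathbb{P}\Big(\sup_{f \in \mathcal{F}} \sum_{i=1}^{n} f(x_i) \ge \sup_{f \in \mathcal{F}} \sum_{i=1}^{n} f(y_i) + 2\sqrt{Wt}\Big) \le 2^{\alpha+1} \exp\Big\{-\frac{\alpha}{\alpha+1}\,t\Big\},$$

where $W = \sup_{f \in \mathcal{F}} \sum_{i=1}^{n} (f(x_i) - f(y_i))^2$. For any $(x_1, \ldots, x_n, y_1, \ldots, y_n)$, let $\Pi$ be the set of
permutations of these coordinates such that, for each $1 \le i \le n$, $\pi(x_i), \pi(y_i) \in \{x_i, y_i\}$, and
let $\mathbb{P}_\pi$ denote the uniform probability measure on $\Pi$. Since the above probability is invariant
with respect to any $\pi \in \Pi$, it is enough to show that for any fixed $x = (x_1, \ldots, x_n)$ and
$y = (y_1, \ldots, y_n)$ the probability over permutations satisfies

$$\mathbb{P}_\pi\Big(\sup_{f \in \mathcal{F}} \sum_{i=1}^{n} f(z_i^1) \ge \sup_{f \in \mathcal{F}} \sum_{i=1}^{n} f(z_i^2) + 2\sqrt{Wt}\Big) \le 2^{\alpha+1} \exp\Big\{-\frac{\alpha}{\alpha+1}\,t\Big\},$$

where $z_i^1 = \pi(x_i)$ and $z_i^2 = \pi(y_i)$. Note that $W$ is invariant under permutations. We can
rewrite it differently in terms of an i.i.d. Bernoulli sequence $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)$, i.e. $\mathbb{P}(\varepsilon_i = 0) =
\mathbb{P}(\varepsilon_i = 1) = 1/2$. Namely, we can write

$$f(z_i^1) = f(y_i) + \varepsilon_i(f(x_i) - f(y_i)), \qquad f(z_i^2) = f(x_i) - \varepsilon_i(f(x_i) - f(y_i)),$$

and instead of permutations look at the distribution $\mathbb{P}_\varepsilon$ of $\varepsilon$. For any $f \in \mathcal{F}$ let us denote
$c_f = \sum_{i=1}^{n} f(y_i)$, $c_f' = \sum_{i=1}^{n} f(x_i)$, and $f_i = f(x_i) - f(y_i)$. Then, we need to prove that

$$\mathbb{P}_\varepsilon\Big(\sup_{f \in \mathcal{F}} \Big(c_f + \sum_{i=1}^{n} \varepsilon_i f_i\Big) \ge \sup_{f \in \mathcal{F}} \Big(c_f' - \sum_{i=1}^{n} \varepsilon_i f_i\Big) + 2\Big(t \sup_{f \in \mathcal{F}} \sum_{i=1}^{n} f_i^2\Big)^{1/2}\Big) \le 2^{\alpha+1} \exp\Big\{-\frac{\alpha}{\alpha+1}\,t\Big\}.$$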

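But this is an easy consequence of Proposition 1. Let us consider the functionals

$$\Phi(\varepsilon) = \sup_{f \in \mathcal{F}} \Big(c_f + \sum_{i=1}^{n} \varepsilon_i f_i\Big), \qquad \Phi'(\varepsilon) = \sup_{f \in \mathcal{F}} \Big(c_f' - \sum_{i=1}^{n} \varepsilon_i f_i\Big).$$

They are both convex, with the Lipschitz norm bounded by

$$\|\Phi\|_L,\ \|\Phi'\|_L \le \Big(\sup_{f \in \mathcal{F}} \sum_{i=1}^{n} f_i^2\Big)^{1/2}.$$

Also, by symmetry, they have the same median, $M = M(\Phi) = M(\Phi')$ with respect to $\mathbb{P}_\varepsilon$.
We will now show that from the convexity of $\Phi$ and $\Phi'$ and Proposition 1 it follows that

$$\mathbb{P}_\varepsilon\big(\Phi(\varepsilon) \ge M + \|\Phi\|_L \sqrt{t}\big) \le 2^{\alpha} \exp\Big\{-\frac{\alpha}{\alpha+1}\,t\Big\} \tag{2.2}$$

and

$$\mathbb{P}_\varepsilon\big(\Phi'(\varepsilon) \le M - \|\Phi'\|_L \sqrt{t}\big) \le 2^{\alpha} \exp\Big\{-\frac{\alpha}{\alpha+1}\,t\Big\}. \tag{2.3}$$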

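Let us recall how this is usually done (see Ledoux and Talagrand (1991)). If we consider
the set $A = \{\varepsilon : \Phi(\varepsilon) \le M\}$, then $\mathbb{P}(A) \ge 1/2$ and by convexity of $\Phi$, $\mathrm{conv}\,A = A$. This,
together with the Lipschitz condition, implies that

$$\{f_c^2(A, \varepsilon) \le t\} \subseteq \{\Phi(\varepsilon) \le M + \|\Phi\|_L \sqrt{t}\}.$$

Thus, the right tail (2.2) follows from Proposition 1. Similarly, if we consider the set

$$B = \{\varepsilon : \Phi'(\varepsilon) \le M - \|\Phi'\|_L \sqrt{t}\},$$

then

$$\{f_c^2(B, \varepsilon) \le t\} \subseteq \{\Phi'(\varepsilon) \le M\}.$$

By Proposition 1,

$$\frac{1}{2} \le \mathbb{P}\big(f_c^2(B, \varepsilon) \ge t\big) \le \frac{1}{\mathbb{P}(B)^{\alpha}} \exp\Big\{-\frac{\alpha}{\alpha+1}\,t\Big\}.$$

We can rewrite this as

$$\mathbb{P}(B) \le 2^{\beta} \exp\Big\{-\frac{\beta}{\beta+1}\,t\Big\},$$

where $\beta = 1/\alpha$. But since $\alpha$ is arbitrary, this proves the lower tail (2.3), which completes
the proof of the theorem.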

□ 
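The Bernoulli representation used in the proof lends itself to a brute-force check on small examples. The sketch below is a toy illustration under stated assumptions: the class is finite and given by the hypothetical tables `fx[k][i]` $= f_k(x_i)$ and `fy[k][i]` $= f_k(y_i)$, and the function names are not from the paper. It enumerates all $\varepsilon \in \{0,1\}^n$ and computes exactly the probability that $\Phi(\varepsilon) \ge \Phi'(\varepsilon) + 2\sqrt{Wt}$, which the proof bounds by $2^{\alpha+1}\exp\{-\alpha t/(\alpha+1)\}$.

```python
import itertools
import math


def symmetrized_tail(fx, fy, t):
    """Exact P_eps( Phi(eps) >= Phi'(eps) + 2 * sqrt(W * t) ) for a finite class.

    fx[k][i] and fy[k][i] hold f_k(x_i) and f_k(y_i) (hypothetical inputs).
    """
    n = len(fx[0])
    # f_i = f(x_i) - f(y_i) for every function in the class
    diffs = [[a - b for a, b in zip(row_x, row_y)] for row_x, row_y in zip(fx, fy)]
    w = max(sum(d * d for d in row) for row in diffs)   # W = sup_f sum_i f_i^2
    threshold = 2.0 * math.sqrt(w * t)
    c = [sum(row_y) for row_y in fy]        # c_f  = sum_i f(y_i)
    c_prime = [sum(row_x) for row_x in fx]  # c'_f = sum_i f(x_i)
    hits = 0
    for eps in itertools.product((0, 1), repeat=n):
        shifts = [sum(e * d for e, d in zip(eps, row)) for row in diffs]
        phi = max(cf + s for cf, s in zip(c, shifts))
        phi_prime = max(cf - s for cf, s in zip(c_prime, shifts))
        if phi >= phi_prime + threshold:
            hits += 1
    return hits / 2 ** n


def talagrand_bound(t, alpha):
    """The bound 2^(alpha+1) * exp(-alpha * t / (alpha + 1)) from the proof."""
    return 2.0 ** (alpha + 1.0) * math.exp(-alpha * t / (alpha + 1.0))
```

The enumeration costs $2^n$ evaluations, so it is only meant for very small $n$; it is a sanity check of the symmetrized statement, not a substitute for the argument via Proposition 1.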

This result is an intermediate step in obtaining the concentration inequality for $Z(x)$ in
its final form, since $V$ still depends on $x$. Notice that here we did not assume any boundedness
of $f \in \mathcal{F}$, and the result is of a somewhat similar nature to the self-normalization phenomenon
in the one-dimensional case (see Giné et al. (1997), or Shao (1997)). Under the additional
assumption that $f \in \mathcal{F}$ are uniformly bounded one can proceed by controlling the deviation
of $V$ (or $W$) from its expectation, which is done in a usual way, either via control by two
points as in Talagrand (1996) plus some truncation argument, or via a sharp concentration
inequality of Boucheron et al. (2000).

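Let us assume now that

$$\forall f \in \mathcal{F},\ \forall x \in \Omega, \qquad -\tfrac{1}{2} \le f(x) \le \tfrac{1}{2}.$$

If we introduce $V_i = \mathbb{E}_y \sup_{f \in \mathcal{F}} \sum_{j \ne i} (f(x_j) - f(y_j))^2$ then it is easy to see that

$$0 \le V - V_i \le 1 \quad \text{and} \quad \sum_{i=1}^{n} (V - V_i) \le V.$$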
Under these conditions, Theorem 6 in Boucheron et al. (2000) states that for all $t \ge 0$,

$$\mathbb{P}(V \ge \mathbb{E}V + t) \le \exp\Big\{-\mathbb{E}V\, h\Big(\frac{t}{\mathbb{E}V}\Big)\Big\}, \tag{2.4}$$

where $h(x) = (1 + x)\log(1 + x) - x$. Since $h(x) \ge x^2/(2 + 2x/3)$, (2.4) implies Bernstein's
inequality

$$\mathbb{P}(V \ge \mathbb{E}V + t) \le \exp\Big\{-\frac{t^2}{2\,\mathbb{E}V + 2t/3}\Big\},$$

which can be equivalently written as

$$\mathbb{P}\Big(V \ge \mathbb{E}V + \tfrac{1}{3}\big(18\,\mathbb{E}V\,t + t^2\big)^{1/2} + \tfrac{t}{3}\Big) \le e^{-t}.$$
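The passage from the Bernstein form to the "inverted" form amounts to solving $s^2/(2\,\mathbb{E}V + 2s/3) = t$ for $s$, a quadratic whose positive root is $s = \frac13(18\,\mathbb{E}V\,t + t^2)^{1/2} + \frac{t}{3}$. The sketch below checks this algebra and the inequality $h(x) \ge x^2/(2 + 2x/3)$ at a sample point; the function names are illustrative assumptions.

```python
import math


def h(x: float) -> float:
    """h(x) = (1 + x) * log(1 + x) - x."""
    return (1.0 + x) * math.log(1.0 + x) - x


def bernstein_exponent(ev: float, s: float) -> float:
    """Exponent s^2 / (2 * EV + 2 * s / 3) in Bernstein's inequality."""
    return s * s / (2.0 * ev + 2.0 * s / 3.0)


def bernstein_deviation(ev: float, t: float) -> float:
    """Deviation s solving bernstein_exponent(ev, s) == t."""
    return math.sqrt(18.0 * ev * t + t * t) / 3.0 + t / 3.0
```

Plugging `bernstein_deviation(ev, t)` back into `bernstein_exponent` returns `t` up to rounding, which is exactly the equivalence of the two displayed forms.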

More generally, if $-b \le f(x) \le b$, then

$$\mathbb{P}\Big(V \ge \mathbb{E}V + \frac{2b}{3}\big(18\,\mathbb{E}V\,t + 4b^2t^2\big)^{1/2} + \frac{4b^2t}{3}\Big) \le e^{-t}.$$

Combining this with Theorem 1 we get the following corollary.

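Corollary 2 If $-b \le f(x) \le b$ then for all $t \ge \log 2$,

$$\mathbb{P}\Big(|Z - \mathbb{E}Z| \ge 2\Big(t\Big(\mathbb{E}V + \frac{2b}{3}\big(18\,\mathbb{E}V\,t + 4b^2t^2\big)^{1/2} + \frac{4b^2t}{3}\Big)\Big)^{1/2}\Big) \le 4e^{1 - (\sqrt{t} - \sqrt{\log 2})^2} + e^{-t}. \tag{2.5}$$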



It is clear that in the range of parameters $1 \ll t \ll \mathbb{E}V/b^2$, the bound of the Corollary
will be dominated by the term $\sim 2\sqrt{\mathbb{E}V\,t}$. For this range, it improves upon the control of the
lower tail given by Theorem 12 in Massart (2000), which states

$$\mathbb{P}\big(Z \le \mathbb{E}Z - 2\sqrt{1.35\,\mathbb{E}V\,t} - 3.5\,bt\big) \le e^{-t}. \tag{2.6}$$

Actually, one can check that

$$2\Big(t\Big(\mathbb{E}V + \frac{2b}{3}\big(18\,\mathbb{E}V\,t + 4b^2t^2\big)^{1/2} + \frac{4b^2t}{3}\Big)\Big)^{1/2} \le 2\sqrt{1.35\,\mathbb{E}V\,t} + 3.5\,bt$$

for all parameters $b$, $\mathbb{E}V$, $t$. Unfortunately, (2.5) and (2.6) are not comparable in all range of
parameters, mainly, because of the term $\exp\{-(\sqrt{t} - \sqrt{\log 2})^2\}$.
Finally, for more results in this 






3 Vapnik-Chervonenkis type inequalities. 



In this section we are trying to control the functional $Q_n f$ uniformly over the class $\mathcal{F}$, where

$$Q_n f = Pf - P_n f \quad \text{or} \quad Q_n f = P_n f - Pf$$

and

$$Pf = \int f(x)\,dP(x), \qquad P_n f = \frac{1}{n} \sum_{i=1}^{n} f(x_i).$$

The difference from the previous section is that now the bounds on $Q_n f$ will depend on $f$
and will reflect that the function $f$ with a smaller variance should have a tighter bound.
The results of this section are in the spirit of Vapnik and Chervonenkis (1968) and Panchenko
(2002).

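Corresponding to $Q_n f$, let us introduce

$$S_n f = \frac{1}{n} \sum_{i=1}^{n} (f(y_i) - f(x_i)) \quad \text{or} \quad S_n f = \frac{1}{n} \sum_{i=1}^{n} (f(x_i) - f(y_i)).$$

Finally, we define

$$R_n f = \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i (f(y_i) - f(x_i)),$$

$$Wf = W(f, x, y) = \frac{1}{n} \sum_{i=1}^{n} (f(y_i) - f(x_i))^2, \qquad Vf = V(f, x) = \mathbb{E}_y W(f, x, y).$$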


As one of the consequences of our approach we will give a uniform control of $Q_n f/(Vf)^{1/2}$
for VC-subgraph classes of functions. The original result of Vapnik and Chervonenkis [17]
provided a uniform control for $Q_n f/(Pf)^{1/2}$ for VC-classes of functions taking values
$f \in \{0,1\}$ (and a simple generalization for VC-major classes taking values in $[0,1]$). The fact
that we can substitute $Pf$ by $Vf$ gives a new way to control $Q_n f$.

Let us introduce a function $\Phi(f, x, y)$ which is invariant over all permutations of $(x, y)$
that switch only the same coordinates of $x$ and $y$. Assume that for some fixed $\beta \in (0, 1)$ and
for any fixed $(x, y)$ we have

$$\mathbb{P}_\varepsilon\Big(\sup_{f \in \mathcal{F}} \big(R_n f - \Phi(f, x, y)\big) > 0\Big) \le 1 - \beta. \tag{3.1}$$

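Then the following theorem holds.

Theorem 2 Assume that (3.1) holds. Then for any $t \ge \log \beta^{-1}$,

$$\mathbb{P}\Big(\exists f \in \mathcal{F}:\ Q_n f \ge \mathbb{E}_y \Phi(f, x, y) + 2\Big(\frac{Vf\,t}{n}\Big)^{1/2}\Big) \le \exp\Big(1 - \big(\sqrt{t} - \sqrt{\log \beta^{-1}}\big)^2\Big).$$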
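Proof. We will first prove that for any $\alpha > 0$ the statement of the theorem holds with the
right hand side substituted by $\beta^{-\alpha} \exp\big(1 - \frac{\alpha}{\alpha+1}\,t\big)$. The result will follow by optimization
over $\alpha$. First of all, by Corollary 1 it is enough to prove that

$$\mathbb{P}\Big(\exists f \in \mathcal{F}:\ S_n f \ge \Phi(f, x, y) + 2\Big(\frac{Wf\,t}{n}\Big)^{1/2}\Big) \le \frac{1}{\beta^{\alpha}} \exp\Big(-\frac{\alpha}{\alpha+1}\,t\Big).$$

Since $\Phi(f, x, y)$ is invariant under permutations of $x_i$ and $y_i$, we can write

$$\mathbb{P}\Big(\exists f:\ S_n f \ge \Phi(f, x, y) + 2\Big(\frac{Wf\,t}{n}\Big)^{1/2}\Big) = \mathbb{P}\Big(\exists f:\ R_n f \ge \Phi(f, x, y) + 2\Big(\frac{Wf\,t}{n}\Big)^{1/2}\Big)$$

$$= \mathbb{E}\,\mathbb{P}_\varepsilon\Big(\exists f:\ R_n f \ge \Phi(f, x, y) + 2\Big(\frac{Wf\,t}{n}\Big)^{1/2}\Big). \tag{3.2}$$

For a fixed $(x, y)$ consider a set

$$A = \Big\{\varepsilon : \sup_{f \in \mathcal{F}} \big(R_n f - \Phi(f, x, y)\big) \le 0\Big\}.$$

By condition (3.1), $\mathbb{P}_\varepsilon(A) \ge \beta$. If we denote $A_t = \{\varepsilon : f_c^2(A, \varepsilon) \le t\}$ then (1.1) implies that

$$\mathbb{P}_\varepsilon(A_t) \ge 1 - \frac{1}{\beta^{\alpha}} \exp\Big(-\frac{\alpha}{\alpha+1}\,t\Big).$$

Let us take $\varepsilon \in A_t$ and $\varepsilon' \in A$. The definition of $A$ implies that for any $f \in \mathcal{F}$

$$\frac{1}{n} \sum_{i=1}^{n} \varepsilon_i' (f(y_i) - f(x_i)) \le \Phi(f, x, y)$$

and, therefore,

$$\frac{1}{n} \sum_{i=1}^{n} \varepsilon_i (f(y_i) - f(x_i)) \le \Phi(f, x, y) + \frac{1}{n} \sum_{i=1}^{n} (\varepsilon_i - \varepsilon_i')(f(y_i) - f(x_i)) \le \Phi(f, x, y) + \frac{2}{n} \sum_{i=1}^{n} |f(y_i) - f(x_i)|\, I(\varepsilon_i \ne \varepsilon_i').$$

But since $\varepsilon \in A_t$, (1.2) implies that one can choose $\varepsilon' \in A$ so that

$$\frac{2}{n} \sum_{i=1}^{n} |f(y_i) - f(x_i)|\, I(\varepsilon_i \ne \varepsilon_i') \le \frac{2}{n} \Big(t \sum_{i=1}^{n} (f(y_i) - f(x_i))^2\Big)^{1/2} = 2\Big(\frac{Wf\,t}{n}\Big)^{1/2}.$$

This proves the theorem.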



Let us consider a special case of $\Phi(f, x, y)$ which satisfies condition (3.1). Let us note
here that application of Talagrand's concentration inequality for two point space as it was
implemented in Theorem 2 is not crucial for the examples of this section. It is a well known
fact that the chaining technique that we will only use here to bound the $(1 - \beta)$-quantile
implies tail estimates as well. But it is hard to argue with the fact that the application of
Talagrand's inequality even for these examples is more elegant as it immediately provides
the tail estimates once the bound for the quantile is obtained.

We will assume from now on that $0 \in \mathcal{F}$. Let $d$ be a metric on $\mathcal{F}$. Given $u > 0$ we
say that a subset $\mathcal{F}' \subseteq \mathcal{F}$ is $u$-separated if for any $f \ne g \in \mathcal{F}'$ we have $d(f, g) > u$. Let a
packing number $D(\mathcal{F}, u, d)$ be the maximal cardinality of a $u$-separated set.

We define

$$\Phi(f, x, y) = K n^{-1/2} \int_0^{\sqrt{Wf}} \big(\log D(\mathcal{F}, u, d_{x,y})\big)^{1/2}\,du,$$

where

$$d_{x,y}(f, g) = \Big(\frac{1}{n} \sum_{i=1}^{n} \big(f(y_i) - f(x_i) - g(y_i) + g(x_i)\big)^2\Big)^{1/2}$$

and $K = K(\beta)$ depends only on $\beta$. For example, if $K(\beta) = 8(p + 2)^{1/2}$, where $p$ is such that
$\sum_{i=2}^{\infty} i^{-p} \le 1 - \beta$, then the following theorem holds.

Theorem 3 If $K(\beta)$ is defined as above then (3.1) holds.

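Proof. The proof is based on standard chaining technique. Let us fix $(x, y)$. Define

$$F = \big\{(f(y_1) - f(x_1), \ldots, f(y_n) - f(x_n)) : f \in \mathcal{F}\big\}$$

and

$$d(f, g) = \Big(\frac{1}{n} \sum_{i=1}^{n} (f_i - g_i)^2\Big)^{1/2}, \qquad f, g \in F.$$

Then, if

$$\Phi(f) = K(\beta)\, n^{-1/2} \int_0^{d(f,0)} \big(\log D(F, u, d)\big)^{1/2}\,du,$$

we need to prove that

$$\mathbb{P}_\varepsilon\Big(\sup_{f \in F} \Big(\frac{1}{n} \sum_{i=1}^{n} \varepsilon_i f_i - \Phi(f)\Big) > 0\Big) \le 1 - \beta.$$

Let $j_0$ be defined as

$$j_0 = \inf\{j : D(F, 2^{-j}, d) \ge 2\}.$$

Consider an increasing sequence of sets

$$\{0\} = F_0 = \ldots = F_{j_0 - 1} \subseteq F_{j_0} \subseteq F_{j_0 + 1} \subseteq \ldots$$

such that for any $g \ne h \in F_j$, $d(g, h) > 2^{-j}$ and for all $f \in F$ there exists $g \in F_j$ such that
$d(f, g) \le 2^{-j}$. The cardinality of $F_j$ can be bounded by

$$|F_j| \le D(F, 2^{-j}, d).$$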
For simplicity of notations we will write $D(u) := D(F, u, d)$. If $D(2^{-j}) = D(2^{-j-1})$ then in
the construction of the sequence $(F_j)$ we will set $F_j$ equal to $F_{j+1}$. We will now define the
sequence of projections $\pi_j : F \to F_j$, $j \ge 0$, in the following way. If $f \in F$ is such that
$d(f, 0) \in (2^{-j-1}, 2^{-j}]$ then set $\pi_0(f) = \ldots = \pi_j(f) = 0$ and for $k \ge j + 1$ choose $\pi_k(f) \in F_k$
such that $d(f, \pi_k(f)) \le 2^{-k}$. In the case when $F_k = F_{k+1}$ we will choose $\pi_k(f) = \pi_{k+1}(f)$.
This construction implies that $d(\pi_{k-1}(f), \pi_k(f)) \le 2^{-k+2}$. Let us introduce a sequence of sets

$$\Delta_j = \{g - h : g \in F_j,\ h \in F_{j-1},\ d(g, h) \le 2^{-j+2}\}, \qquad j \ge j_0,$$

and let $\Delta_j = \{0\}$ if $D(2^{-j}) = D(2^{-j+1})$. The cardinality of $\Delta_j$ does not exceed

$$|\Delta_j| \le |F_j|^2 \le D(2^{-j})^2.$$

By construction any $f \in F$ can be represented as a sum of elements from $\Delta_j$,

$$f = \sum_{j \ge j_0} \big(\pi_j(f) - \pi_{j-1}(f)\big), \qquad \pi_j(f) - \pi_{j-1}(f) \in \Delta_j.$$

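Let

$$I_j = n^{-1/2} \int_{2^{-j-1}}^{2^{-j}} \big(\log D(u)\big)^{1/2}\,du$$

and define the event

$$A = \bigcup_{j \ge j_0} \Big\{\sup_{f \in \Delta_j} \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i f_i \ge K I_j\Big\}.$$

On the complement $A^c$ of the event $A$ we have for any $f \in F$ such that $d(f, 0) \in (2^{-j-1}, 2^{-j}]$

$$\frac{1}{n} \sum_{i=1}^{n} \varepsilon_i f_i = \sum_{k \ge j+1} \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i \big(\pi_k(f) - \pi_{k-1}(f)\big)_i \le \sum_{k \ge j+1} K I_k$$

$$\le K n^{-1/2} \int_0^{2^{-j-1}} \big(\log D(u)\big)^{1/2}\,du \le K n^{-1/2} \int_0^{d(f,0)} \big(\log D(u)\big)^{1/2}\,du.$$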



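It remains to prove that for some constant $K(\beta)$, $\mathbb{P}(A) \le 1 - \beta$. Indeed,

$$\mathbb{P}(A) \le \sum_{j=j_0}^{\infty} \mathbb{P}\Big(\sup_{f \in \Delta_j} \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i f_i \ge K I_j\Big)$$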

< E |A,|exp{-^}/(D(2-) > D(2->^)) 

< Y,exp[2logD{2-^) - ^z^}liD{2-^) > D{2-^+')), 

3=30 
since for $f \in \Delta_j$, $n^{-1} \sum_{i=1}^{n} f_i^2 \le 2^{-2j+4}$. The fact that $D(u)$ is decreasing implies

$$I_j \ge n^{-1/2}\, 2^{-(j+1)} \big(\log D(2^{-j})\big)^{1/2}$$

and, therefore,



PiA) < ^exp{-logD(2-^)(i^22-6 - 2)}J(D(2-^) > D(2-^+^)) 

j=jo 

i=io ^ ^ i=2 
for $p = K(\beta)^2 2^{-6} - 2$ big enough. We used the fact that $D(2^{-j_0}) \ge 2$.



Example (Uniform entropy conditions). Let us introduce uniform packing numbers $D(\mathcal{F}, u)$
as any function such that

$$\sup_Q D(\mathcal{F}, u, L_2(Q)) \le D(\mathcal{F}, u),$$

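where the supremum is taken over all discrete probability measures. One can easily check
that

$$d_{x,y}(f, g)^2 \le \frac{2}{n} \sum_{i=1}^{n} (f(x_i) - g(x_i))^2 + \frac{2}{n} \sum_{i=1}^{n} (f(y_i) - g(y_i))^2$$

and, therefore, in the case when the packing numbers are bounded uniformly we get

$$D(\mathcal{F}, u, d_{x,y}) \le D(\mathcal{F}, u/2).$$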

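Hence,

$$\mathbb{E}_y \Phi(f, x, y) \le K(\beta)\, n^{-1/2}\, \mathbb{E}_y \int_0^{\sqrt{Wf}} \big(\log D(\mathcal{F}, u/2)\big)^{1/2}\,du \le 2K(\beta)\, n^{-1/2} \int_0^{\sqrt{Vf}/2} \big(\log D(\mathcal{F}, u)\big)^{1/2}\,du.$$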
Jo 



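Corollary 3 For any $t \ge \log \beta^{-1}$,

$$\mathbb{P}\Big(\exists f \in \mathcal{F}:\ Q_n f \ge \frac{2K}{\sqrt{n}} \int_0^{\sqrt{Vf}/2} \big(\log D(\mathcal{F}, u)\big)^{1/2}\,du + 2\Big(\frac{Vf\,t}{n}\Big)^{1/2}\Big) \le \exp\Big(1 - \big(\sqrt{t} - \sqrt{\log \beta^{-1}}\big)^2\Big).$$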

□ 

In the case of VC-subgraph classes with VC dimension $d$ (for definition, see van der
Vaart and Wellner (1996)), the result of [H] gives

$$D(\mathcal{F}, u) \le e(d + 1)\Big(\frac{2e}{u^2}\Big)^{d},$$

and, therefore, the following corollary.
Corollary 4 (Normalization by variance). There exists $K$ that depends only on $\beta$ such that
for any $t \ge \log \beta^{-1}$,

f(3/ . ^ ^ > + < expd - ( V* - 



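Let us rewrite $Vf$ as

$$Vf = V(f, x) = \mathrm{Var} f + \mathrm{Var}_n f + (Pf - P_n f)^2 = \mathrm{Var} f + \mathrm{Var}_n f + (Q_n f)^2,$$

where

$$\mathrm{Var}_n f = \frac{1}{n} \sum_{i=1}^{n} (P_n f - f(x_i))^2$$

is a sample variance. If we denote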



n 

i=l 



\ n \ n 

then one can solve the inequality of Corollary 4 for Qn/ to get 

p(3/ G ^ |g„/| > 2^( ^'''/_\^T"0 '^') - - (v^- 0^)^). 

Let us compare this to an "optimistic" inequality of Vapnik and Chervonenkis
which states that if $\mathcal{F} = \{f : \Omega \to \{0, 1\}\}$ is a VC-class of indicator functions with VC
dimension d, then with probability at least 1 — e^*/^, for all / G 



n(P/) 



1=1 



Compared to the inequality of Vapnik and Chervonenkis our inequality controls the deviation
of $P_n f$ from $Pf$ in both directions, no assumptions are made on the boundedness of functions
$f \in \mathcal{F}$, and the deviation is controlled by the mixture of variance and sample variance rather
than by expectation $Pf$, which can be considered as a significant improvement.
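The "mixture of variance and sample variance" can be made concrete. With $Wf = \frac{1}{n}\sum_{i}(f(y_i) - f(x_i))^2$ and $Vf = \mathbb{E}_y Wf$ as defined at the start of this section, expanding the square gives $Vf = \mathrm{Var} f + \mathrm{Var}_n f + (Pf - P_n f)^2$. The sketch below checks this identity exactly for a discrete law of $f(y_i)$; the function names and the representation of the law by `values` and `probs` are assumptions of this illustration.

```python
def mixed_variance(sample, values, probs):
    """V f = E_y (1/n) * sum_i (f(y_i) - f(x_i))^2 for a discrete law of f(y_i).

    sample holds f(x_1), ..., f(x_n); values/probs describe the law of f(y_i).
    """
    n = len(sample)
    return sum(p * (v - s) ** 2 for s in sample for v, p in zip(values, probs)) / n


def variance_decomposition(sample, values, probs):
    """Var f + Var_n f + (P f - P_n f)^2 computed term by term."""
    n = len(sample)
    mean = sum(p * v for v, p in zip(values, probs))               # P f
    var = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))  # Var f
    emp_mean = sum(sample) / n                                     # P_n f
    emp_var = sum((emp_mean - s) ** 2 for s in sample) / n         # Var_n f
    return var + emp_var + (mean - emp_mean) ** 2
```

For the sample `[0.0, 1.0, 3.0]` and a fair two-point law on `{0, 2}`, both functions return `8/3`.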

Example (The case of one function). When $\mathcal{F}$ consists of one function $f$ we will simply
write $f(X) = \xi$. Let us take $\beta = 1/2$ and let

$$\Phi = 0, \qquad \bar{\xi} = \frac{1}{n} \sum_{i=1}^{n} \xi_i.$$

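Obviously, with this choice of $\beta$ and $\Phi$ condition (3.1) holds and Theorem 2 implies

$$\mathbb{P}\Big(|\bar{\xi} - \mathbb{E}\xi| \ge 2\Big(\frac{\big(\mathrm{Var}\,\xi + \mathrm{Var}_n\,\xi + (\bar{\xi} - \mathbb{E}\xi)^2\big)\,t}{n}\Big)^{1/2}\Big) \le 2\exp\Big(1 - \big(\sqrt{t} - \sqrt{\log 2}\big)^2\Big).$$

Solving the inequality for $|\bar{\xi} - \mathbb{E}\xi|$ we get

$$\mathbb{P}\Big(|\bar{\xi} - \mathbb{E}\xi| \ge 2\Big(\frac{(\mathrm{Var}\,\xi + \mathrm{Var}_n\,\xi)\,t}{n - 4t}\Big)^{1/2}\Big) \le 2\exp\Big(1 - \big(\sqrt{t} - \sqrt{\log 2}\big)^2\Big). \tag{3.3}$$

One should compare this to Bernstein type inequalities. First of all, we don't assume any
moment conditions other than the existence of variance of $\xi$. Second, in Bernstein's inequality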

\^-E^\ < [^^) fort<nVare, 

l^-E^I < - for t > nVar e, 

n 

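whereas (3.3) gives

$$|\bar{\xi} - \mathbb{E}\xi| \le 2\Big(\frac{2(\mathrm{Var}\,\xi + \mathrm{Var}_n\,\xi)\,t}{n}\Big)^{1/2} \quad \text{for } t \le n/8.$$

This, basically, means that the deviation of the average $\bar{\xi}$ from the expectation can be
large only when the sample variance is large.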

Acknowledgment. We want to thank Michel Talagrand for some valuable comments 
and suggestions. 